Mobility Data Report

Overview

Dataset statistics

Number of records 2834268
Distinct trips 1417134
Number of complete trips (start and end point) 1417134
Number of incomplete trips (single point) 0
Distinct users 378759
Distinct locations (lat & lon combination) 204228
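
The statistics above can be approximated with a few pandas aggregations. The following is only a sketch: it assumes the records sit in a DataFrame with the columns listed under "Missing values" (uid, tid, lat, lng), and that a complete trip is one with exactly two records, which matches the ratio of records to trips reported here. The function name `dataset_statistics` and the toy data are hypothetical.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame) -> dict:
    """Summarize a table with one row per trip endpoint (columns: uid, tid, lat, lng)."""
    points_per_trip = df.groupby("tid").size()
    return {
        "records": len(df),
        "distinct_trips": int(df["tid"].nunique()),
        # Assumption: a complete trip has exactly a start and an end record.
        "complete_trips": int((points_per_trip == 2).sum()),
        "incomplete_trips": int((points_per_trip == 1).sum()),
        "distinct_users": int(df["uid"].nunique()),
        "distinct_locations": len(df[["lat", "lng"]].drop_duplicates()),
    }


# Toy data (made up for illustration): two complete trips and one single point.
toy = pd.DataFrame({
    "uid": [1, 1, 2, 2, 3],
    "tid": [10, 10, 20, 20, 30],
    "lat": [52.50, 52.52, 52.50, 52.48, 52.49],
    "lng": [13.40, 13.41, 13.40, 13.38, 13.39],
})
stats = dataset_statistics(toy)
```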

Missing values

User ID (uid) 0
Trip ID (tid) 0
Timestamp (datetime) 0
Latitude (lat) 0
Longitude (lng) 0

Temporal properties

Number of trips over time

Distribution

Min. 2018-04-18
25% 2018-04-19
Median 2018-04-19
75% 2018-04-19
Max. 2018-04-20

Number of trips per weekday

Number of trips per hour of day split by weekday and weekend

Place analysis

Visits per tile

Points outside the given tessellation: 18

The following statistics summarize how the visits are distributed over the tiles (minimum, maximum and quartiles). They show whether some tiles are visited more often than others or whether the visits are spread evenly over all tiles.

Distribution

Min. 14.0
25% 2794.0
Median 5597.0
75% 10317.0
Max. 47860.0
A different way of visualizing the distribution of visits over tiles is the cumulative sum of all visits: if only a few tiles account for most of the visits, the curve rises steeply at the beginning and flattens towards the end. If the visits are distributed equally over all tiles, the curve is a straight diagonal.
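
A minimal sketch of such a cumulative curve is given here, assuming the tiles are sorted from most to least visited before summing (the ordering used for the plot is not stated in this report). The function name `cumulative_share` and the example counts are hypothetical.

```python
import numpy as np


def cumulative_share(counts):
    """Cumulative share of visits, with tiles sorted from most to least visited."""
    sorted_counts = np.sort(np.asarray(counts, dtype=float))[::-1]
    return np.cumsum(sorted_counts) / sorted_counts.sum()


# Made-up visit counts for four tiles.
concentrated = cumulative_share([97, 1, 1, 1])  # one tile dominates: steep start, flat end
uniform = cumulative_share([25, 25, 25, 25])    # equal visits: straight diagonal
```

With the concentrated counts, the first tile alone already covers 97 % of all visits; with the uniform counts, every tile adds the same 25 %.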

Ranking most frequently visited tiles

1 Kantstraße / Schillerstraße (Id: 110202420): 47860
2 Schönhauser Allee / Bornholmer Straße (Id: 110510620): 30424
3 Behmstraße / Stettiner Straße (Id: 110500720): 29222
4 Hausvogteiplatz (Id: 110110320): 26938
5 Südstern (Id: 110401620): 24990
6 Osloer Straße / Seestraße (Id: 110500920): 24368
7 Sonnenallee / Hobrechtstraße (Id: 110407510): 24122
8 Friedrich-Wilhelm-Platz (Id: 110506110): 24112
9 Kantstraße / Suarezstraße (Id: 110402410): 23538
10 Müllerstraße / Seestraße (Id: 110500910): 23398

Visits per tile and time window

Weekday: absolute count

Weekday: deviation from average

Origin-destination (OD) analysis

OD flows between tiles

Intra-tile flows

The number and percentage of flows that start and end within the same tile.

229472 (16.19 %) of flows start and end within the same tile.

A large number of intra-tile flows indicates either round trips (e.g., going for a run that starts and ends at the home location) or a tessellation that is too coarse to properly capture flows.
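
The intra-tile share is the number of trips whose origin and destination tile coincide, divided by all trips (here 229472 of 1417134, i.e. 16.19 %). The sketch below shows the calculation on an origin-destination table. The column names `origin`, `destination` and `flow`, the function name `intra_tile_share` and the toy data are assumptions made for illustration.

```python
import pandas as pd


def intra_tile_share(od: pd.DataFrame) -> tuple:
    """od has one row per origin-destination pair with columns origin, destination, flow."""
    intra = od.loc[od["origin"] == od["destination"], "flow"].sum()
    total = od["flow"].sum()
    return int(intra), round(100 * intra / total, 2)


# Toy OD table (made up): 13 of 20 trips stay within their tile.
toy_od = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "destination": ["A", "B", "A", "B"],
    "flow": [3, 5, 2, 10],
})
intra_count, intra_percent = intra_tile_share(toy_od)
```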

Distribution

Mean 14.31
Min. 1.0
25% 2.0
Median 4.0
75% 10.0
Max. 4421.0
A different way of visualizing the distribution of the number of trips per flow is the cumulative sum of all flows: if only a few flows account for most of the trips, the curve rises steeply at the beginning and flattens towards the end. If the trips are distributed equally over all flows, the curve is a straight diagonal.

Most frequent OD connections

Ranking most frequent OD connections

1 Kantstraße / Schillerstraße - Kantstraße / Schillerstraße: 4421.0
2 Wilhelmsruher Damm / Senftenberger Ring - Wilhelmsruher Damm / Senftenberger Ring: 3117.0
3 Friedrich-Wilhelm-Platz - Friedrich-Wilhelm-Platz: 2917.0
4 Friedrichshagener Straße / Bahnhofstraße - Friedrichshagener Straße / Bahnhofstraße: 2559.0
5 Schönhauser Allee / Bornholmer Straße - Schönhauser Allee / Bornholmer Straße: 2533.0
6 Müggelseedamm / Fürstenwalder Damm - Müggelseedamm / Fürstenwalder Damm: 2408.0
7 Frankfurter Allee / Petersburger Straße - Frankfurter Allee / Petersburger Straße: 2381.0
8 Fritz-Erler-Allee / Johannisthaler Chaussee - Fritz-Erler-Allee / Johannisthaler Chaussee: 2261.0
9 Südstern - Südstern: 2244.0
10 Ritterfelddamm / Sackrower Landstraße - Ritterfelddamm / Sackrower Landstraße: 2144.0

Trip statistics

Travel time of trips (in minutes)

25883 outliers have been excluded.
Outliers are values above 90
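
A sketch of how such an outlier cut-off and the distribution below could be computed follows. It assumes travel times are available as a pandas Series in minutes and that values of exactly 90 are kept, which is consistent with the maximum reported below. The function name `travel_time_summary` and the toy values are hypothetical.

```python
import pandas as pd


def travel_time_summary(minutes: pd.Series, max_minutes: float = 90):
    """Count values above the cut-off and return min, quartiles and max of the rest."""
    n_outliers = int((minutes > max_minutes).sum())
    kept = minutes[minutes <= max_minutes]
    return n_outliers, kept.quantile([0, 0.25, 0.5, 0.75, 1.0])


# Toy travel times in minutes (made up); 120 lies above the cut-off of 90.
toy_minutes = pd.Series([4, 16, 27, 43, 90, 120])
n_outliers, quartiles = travel_time_summary(toy_minutes)
```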

Distribution

Min. 4.0
25% 16.0
Median 27.0
75% 43.0
Max. 90.0

Jump length (in kilometers)

481 outliers have been excluded.
Outliers are values above 30

Distribution

Min. 0.0
25% 0.9
Median 3.28
75% 6.55
Max. 29.99

User analysis

Number of trips per user

Distribution

Min. 2.0
25% 2.0
Median 4.0
75% 5.0
Max. 16.0

Time between two consecutive trips of a user

How much time passes between two consecutive trajectories? This information gives insight into the temporal density of the dataset. If trajectories follow each other directly, the time in between is only as long as the stay at that place. If trips are collected only sparsely, there might be days between single trips of a user.
This analysis is based on the assumption that the trips of a user follow each other consecutively and do not overlap, i.e., a trip cannot start before the previous one has ended. Therefore, we first perform a plausibility check to ensure that no user trips overlap. Overlapping trips might be an indication of a faulty dataset.

Plausibility check: overlapping user trips

There is 1 case where the start time of the following trip precedes the end time of the previous trip.
If overlapping trips are present in the dataset, the minimum time between trips will be negative.
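
The plausibility check can be sketched as follows, assuming one row per trip with a start and end timestamp. The column names `start` and `end`, the function name `time_between_trips` and the toy trips are hypothetical. The example also shows that pandas prints a gap of minus one minute as "-1 days +23:59:00", which is how a negative minimum appears in a distribution table.

```python
import pandas as pd


def time_between_trips(trips: pd.DataFrame) -> pd.Series:
    """trips has one row per trip with columns uid, start, end (datetimes)."""
    trips = trips.sort_values(["uid", "start"])
    previous_end = trips.groupby("uid")["end"].shift()
    # The first trip of each user has no predecessor (NaT) and is dropped.
    return (trips["start"] - previous_end).dropna()


# Toy trips (made up): the second trip starts one minute before the first ends.
toy_trips = pd.DataFrame({
    "uid": [1, 1, 1],
    "start": pd.to_datetime(["2018-04-19 08:00", "2018-04-19 08:29", "2018-04-19 12:00"]),
    "end": pd.to_datetime(["2018-04-19 08:30", "2018-04-19 09:00", "2018-04-19 12:20"]),
})
gaps = time_between_trips(toy_trips)
n_overlaps = int((gaps < pd.Timedelta(0)).sum())
print(gaps.min())  # -1 days +23:59:00
```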

Distribution

Min. -1 days +23:59:00 (i.e., minus 1 minute)
25% 0 days 00:16:00
Median 0 days 01:15:00
75% 0 days 03:40:00
Max. 0 days 22:20:00

Radius of gyration (in kilometers)

The radius of gyration is the characteristic distance traveled by an individual during a period of time.
2 outliers have been excluded.
Outliers are values above 30
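
One common formulation of the radius of gyration is the root mean square distance of a user's visited points from their center of mass. The sketch below follows that formulation with haversine distances. It is an assumption that this matches the computation behind this report, and the names `haversine_km` and `radius_of_gyration_km` are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between points given in degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def radius_of_gyration_km(lats, lngs):
    """Root mean square distance of one user's points from their mean position."""
    lats = np.asarray(lats, dtype=float)
    lngs = np.asarray(lngs, dtype=float)
    # Assumption: the arithmetic mean of the coordinates serves as center of mass.
    distances = haversine_km(lats, lngs, lats.mean(), lngs.mean())
    return float(np.sqrt(np.mean(distances ** 2)))


# A user who never leaves one spot has a radius of 0.
rg_single = radius_of_gyration_km([52.5, 52.5], [13.4, 13.4])
# Two points 0.1 degrees of latitude apart: each lies about 5.56 km from the center.
rg_pair = radius_of_gyration_km([52.45, 52.55], [13.4, 13.4])
```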

Distribution

Min. 0.0
25% 1.4
Median 2.61
75% 4.51
Max. 17.97

Location entropy

Location entropy (based on Shannon entropy) captures the diversity of users visiting a place. If most trips to a certain location originate from a single user (or only a few users), the entropy is low. A high entropy suggests that the place is visited evenly by many different users. A dataset with many cells that have high visit counts but low entropy suggests that single users drive certain mobility patterns, which might not be representative of other users.
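
As an illustration, the Shannon entropy of a tile can be computed from the share of its visits contributed by each user. This is a sketch under assumptions: the table layout (one row per visit with columns `uid` and `tile_id`), the base-2 logarithm and the function name `location_entropy` are not taken from this report.

```python
import numpy as np
import pandas as pd


def location_entropy(visits: pd.DataFrame) -> pd.Series:
    """visits has one row per visit with columns uid, tile_id. Returns bits per tile."""
    def shannon(user_ids: pd.Series) -> float:
        # Share of the tile's visits contributed by each user.
        p = user_ids.value_counts(normalize=True).to_numpy()
        return float(-(p * np.log2(p)).sum())

    return visits.groupby("tile_id")["uid"].apply(shannon)


# Toy visits (made up): tile A is visited by one user only, tile B by four users equally.
toy_visits = pd.DataFrame({
    "tile_id": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "uid": [1, 1, 1, 1, 1, 2, 3, 4],
})
entropy = location_entropy(toy_visits)
```

Both tiles have four visits, yet tile A has an entropy of 0 while tile B reaches 2 bits. This is the situation the paragraph above describes: equal visit counts can hide very different user diversity.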

Number of distinct tiles per user

How many different tiles does a single user visit?

Distribution

Min. 1.0
25% 2.0
Median 3.0
75% 3.0
Max. 12.0

Mobility entropy

The mobility entropy (based on Shannon entropy) characterizes the heterogeneity of a user's visitation patterns (including the historical probability that a location was visited by the user).
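
A per-user entropy of this kind can be sketched as follows. The values in the distribution table do not exceed 1.0, which would be consistent with a normalized entropy. The normalization by the logarithm of the number of distinct tiles is therefore an assumption, as is the function name `mobility_entropy`.

```python
import numpy as np


def mobility_entropy(tile_visits, normalize=True):
    """Shannon entropy of the tiles visited by one user (sequence of tile ids)."""
    _, counts = np.unique(np.asarray(tile_visits), return_counts=True)
    p = counts / counts.sum()  # historical probability of visiting each tile
    entropy = float(-(p * np.log2(p)).sum())
    # Assumption: dividing by log2(number of distinct tiles) scales the result to [0, 1].
    if normalize and len(counts) > 1:
        entropy /= float(np.log2(len(counts)))
    return entropy


single_tile = mobility_entropy(["A", "A", "A"])  # always the same tile
balanced = mobility_entropy(["A", "B"])          # two tiles visited equally often
skewed = mobility_entropy(["A", "A", "B"])       # one tile preferred
```

A user who always visits the same tile has an entropy of 0. Equal use of all visited tiles gives the maximum of 1, and a preference for one tile falls in between.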

Distribution

Min. 0.0
25% 0.92
Median 0.96
75% 1.0
Max. 1.0